Relation Schema Induction using Tensor Factorization with Side Information
Abstract
Given a set of documents from a specific domain (e.g., medical research journals), how do we automatically identify the schema of relations, i.e., the type signature of the arguments of relations (e.g., undergo(Patient, Surgery)) – a necessary first step towards building a Knowledge Graph (KG) out of the given set of documents? We refer to this problem as Relation Schema Induction (RSI). While Open Information Extraction (OIE) techniques aim at extracting surface-level text triples of the form (John, underwent, Angioplasty), they do not induce the as-yet-unknown schema of the relations themselves. Tensors provide a natural representation for such triples, and factorization of such tensors provides a plausible solution for the RSI problem. To the best of our knowledge, tensor factorization methods have not been used for the RSI problem. We fill this gap and propose Coupled Non-Negative Tensor Factorization (CNTF), a tensor factorization method which is able to incorporate additional side information in a principled way for more effective Relation Schema Induction. We report our findings on multiple real-world datasets and demonstrate CNTF’s effectiveness over state-of-the-art baselines both in terms of accuracy and speed. We hope to make all datasets and code publicly available upon publication of the paper.
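As a rough illustration of how OIE triples map onto a tensor, the sketch below builds a three-mode count tensor (subject × relation × object) from a handful of triples. It is not CNTF and does not perform any factorization; the mode order, the helper names build_tensor and relation_slice, and every triple except (John, underwent, Angioplasty) are assumptions made for this example.

```python
from collections import Counter

# Hypothetical OIE-style triples (subject, relation, object). Only the first
# one appears in the abstract; the others are invented for illustration.
triples = [
    ("John", "underwent", "Angioplasty"),
    ("Mary", "underwent", "Angioplasty"),
    ("John", "underwent", "Bypass"),
    ("Mary", "visited", "Clinic"),
]


def build_tensor(triples):
    """Return (tensor, subjects, relations, objects).

    tensor[i][k][j] counts how often subject i and object j occur
    together with relation k. The mode order is an assumption.
    """
    subjects = sorted({s for s, _, _ in triples})
    relations = sorted({r for _, r, _ in triples})
    objects = sorted({o for _, _, o in triples})
    s_idx = {s: i for i, s in enumerate(subjects)}
    r_idx = {r: k for k, r in enumerate(relations)}
    o_idx = {o: j for j, o in enumerate(objects)}

    # Dense nested lists are fine for a toy example; real corpora are sparse.
    tensor = [[[0] * len(objects) for _ in relations] for _ in subjects]
    for (s, r, o), count in Counter(triples).items():
        tensor[s_idx[s]][r_idx[r]][o_idx[o]] += count
    return tensor, subjects, relations, objects


def relation_slice(tensor, k):
    """Subject-by-object count matrix for the relation with index k."""
    return [subject_row[k] for subject_row in tensor]


tensor, subjects, relations, objects = build_tensor(triples)
underwent = relation_slice(tensor, relations.index("underwent"))
```

Each relation slice is a subject-by-object matrix; a non-negative factorization of the full tensor would aim to group subjects and objects into latent categories (such as Patient and Surgery) per relation, which is the step this sketch leaves out.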
Similar resources
Event Schema Induction using Tensor Factorization with Back-off
The goal of Event Schema Induction (ESI) is to identify schemas of events from a corpus of documents. For example, given documents from the sports domain, we would like to infer that win(WinningPlayer, Trophy, OpponentPlayer, Location) is an important event schema for this domain. Automatic discovery of such event schemas is an important first step towards building domain-specific Knowledge Gr...
Towards Combined Matrix and Tensor Factorization for Universal Schema Relation Extraction
Matrix factorization of knowledge bases in universal schema has facilitated accurate distantly supervised relation extraction. This factorization encodes dependencies between textual patterns and structured relations using low-dimensional vectors defined for each entity pair; although these factors are effective at combining evidence for an entity pair, they are inaccurate on rare pairs, or for r...
Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks
Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most stu...
Highly Scalable Tensor Factorization for Prediction of Drug-Protein Interaction Type
Understanding the type of inhibitory interaction plays an important role in drug design. Researchers are therefore interested in knowing whether a drug has a competitive or non-competitive interaction with particular protein targets. Method: to analyze the interaction types, we propose the factorization method Macau, which allows us to combine different measurement types into a single tensor togethe...
A social recommender system based on matrix factorization considering dynamics of user preferences
With the expansion of social networks, the use of recommender systems in these networks has attracted considerable attention. Recommender systems have become an important tool for alleviating the information overload problem of users by providing personalized recommendations that a user might like, based on past preferences or observed behavior about one or various items. In these systems...
Publication date: 2016